ATI Cypress Graphics Engine
Instructions are fed into Cypress's graphics engine, which is what AMD calls the front end of this new GPU. It encompasses the setup engine, command processor and the ultra-threaded dispatch processor. There are some notable changes to this portion of Cypress, with some additions and omissions.
In keeping with the double-up theme, AMD has added a second rasteriser and hierarchical-Z processor to ensure the front end can keep up with the workloads the rest of the GPU is built for. The last thing AMD wanted was a GPU that couldn't convert complex wireframe meshes into pixels fast enough, leaving portions of that massive bank of stream processors idle.
DirectX 11 introduces multi-threaded rendering for the first time, but it wasn't clear whether the dual rasterisers required Microsoft's latest API to function correctly. When I enquired about this, AMD said that both rasterisers are accessible all of the time. AMD achieved this by modifying the set-up engine, geometry dispatch unit and ultra-threaded dispatch processor so that the two rasterisers work on two 16-pixel scan converters that each focus on non-overlapping pixels.
In other words, it sounds as if the GPU can use both rasterisers regardless of the game or the DirectX version that game uses, although we'll have to wait until DX11 is released and some DX11 games are available to verify this.
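To picture what "non-overlapping pixels" could mean for the two rasterisers, here is a minimal sketch in Python. AMD hasn't detailed how the screen is divided, so the 4 x 4 tile shape (16 pixels), the checkerboard assignment and every function name below are assumptions made purely for illustration.

```python
# Hypothetical sketch: split screen tiles between two rasterisers so that
# no pixel is handled twice. The 4 x 4 tile size (16 pixels) and the
# checkerboard assignment are assumptions, not AMD's documented scheme.

TILE = 4  # 4 x 4 = 16 pixels per tile (assumed shape)


def rasteriser_for_pixel(x, y):
    """Return 0 or 1: which of the two rasterisers owns pixel (x, y)."""
    tile_x, tile_y = x // TILE, y // TILE
    return (tile_x + tile_y) % 2


def split_pixels(width, height):
    """Partition every pixel of a width x height target into two sets."""
    owned = (set(), set())
    for y in range(height):
        for x in range(width):
            owned[rasteriser_for_pixel(x, y)].add((x, y))
    return owned


if __name__ == "__main__":
    first, second = split_pixels(16, 8)
    # Each rasteriser gets half of the pixels and none are shared.
    print(len(first), len(second), len(first & second))
```

Whatever the real layout is, the property that matters is the one the sketch checks: every pixel belongs to exactly one rasteriser, so the two can work in parallel without duplicating effort.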
Microsoft's new API also introduces tessellation as a standard feature, but AMD's previous tessellation units aren't compliant with DX11's requirements. As a result, AMD has rolled out a new tessellation unit which complies with Direct3D 11, and handles popular tessellation algorithms such as Catmull-Clark in a single cycle. There are also new algorithms available that help to reduce the artifacting that was apparent in some of the previous generation's tessellation units.
The tessellation unit itself is a fixed-function unit in accordance with DX11, but can adjust the level of geometric detail on the fly. Watching this is impressive, and it's a big step forwards for game developers, who used to create multiple subdivided models to control the different levels of detail (LOD) required in a scene. Tessellation enables them to create just one subdivided model and let the GPU handle LOD on the fly, which is both easier to program and looks better. Programming the tessellator is handled using hull and domain shaders, two new programmable shader stages that arrive with Direct3D 11 and are written in HLSL (High-Level Shader Language) for Shader Model 5.0.
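As a rough illustration of on-the-fly LOD, the Python sketch below mimics the three steps: picking a tessellation factor (the hull shader's job), generating sample points across a triangle patch (the fixed-function tessellator's job) and placing a vertex at each point (the domain shader's job). The distance thresholds, the function names and the uniform subdivision pattern are all simplifying assumptions; only the 1 to 64 factor range is taken from Direct3D 11, and the real tessellator uses its own, more elaborate pattern.

```python
# Simplified sketch of distance-based level of detail. Names and distance
# thresholds are illustrative assumptions; only the factor range 1..64
# comes from Direct3D 11.

MAX_TESS_FACTOR = 64  # upper limit on tessellation factors in Direct3D 11


def tess_factor(distance, near=1.0, far=100.0):
    """Pick a whole-number factor: high when close, 1 at 'far' or beyond."""
    t = (distance - near) / (far - near)
    t = min(max(t, 0.0), 1.0)
    return max(1, round(MAX_TESS_FACTOR * (1.0 - t)))


def domain_points(n):
    """Barycentric (u, v, w) points for a triangle split n ways per edge."""
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1)
            for j in range(n + 1 - i)]


def evaluate(patch, uvw):
    """Place one generated vertex on a flat patch of three (x, y, z) corners."""
    u, v, w = uvw
    return tuple(u * a + v * b + w * c for a, b, c in zip(*patch))


if __name__ == "__main__":
    patch = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    for distance in (1.0, 50.5, 100.0):
        n = tess_factor(distance)
        vertices = [evaluate(patch, p) for p in domain_points(n)]
        print(distance, n, len(vertices))
```

The same three-corner patch yields thousands of vertices up close and just three at a distance, which is why a developer only needs to author one model.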
The other notable change is that the interpolators have disappeared. In RV770, these were fixed-function units located in the set-up engine. According to AMD, they limited RV770's performance in some scenarios. Interpolation is now handled by the processing cores of the GPU (aka stream processor clusters). Because a core is more flexible than a fixed-function sub-unit, per-pixel interpolation is now possible using a DX11 feature called pull-model interpolation.
Pull-model interpolation works by passing the generic per-pixel weights from the set-up engine into the pixel shader, with per-vertex attributes then pulled in as required. The pixel shader executes per-pixel interpolation by applying ALU instructions to those weights and attributes. This will ultimately give developers more control over interpolation and, as a result, over texture and shader filtering. What's more, the new method doesn't just add per-pixel interpolation: the move to core processing (rather than fixed-function units) should also remove the previous bottlenecks in linear interpolation operations.
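The arithmetic involved is simple enough to sketch in a few lines of Python. This is a minimal example of plain linear (not perspective-corrected) interpolation across a 2D triangle. The function names are made up for illustration, and the split between "set-up" and "pixel shader" work follows the description above rather than any published AMD code.

```python
# Sketch of interpolation done with ordinary ALU maths instead of a
# fixed-function interpolator. Function names are illustrative assumptions.


def edge(a, b, p):
    """Signed area term for the edge a->b and the point p (2D)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric_weights(v0, v1, v2, p):
    """Per-pixel weights: the part the text attributes to the set-up engine."""
    area = edge(v0, v1, v2)
    w0 = edge(v1, v2, p) / area
    w1 = edge(v2, v0, p) / area
    w2 = edge(v0, v1, p) / area
    return w0, w1, w2


def interpolate(weights, attrs):
    """'Pixel shader' step: pull three per-vertex values and blend them."""
    return sum(w * a for w, a in zip(weights, attrs))


if __name__ == "__main__":
    v0, v1, v2 = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
    weights = barycentric_weights(v0, v1, v2, (1.0, 1.0))
    # A per-vertex attribute (for example one colour channel) at each corner.
    print(weights, interpolate(weights, (0.0, 1.0, 2.0)))
```

Because the blend is just multiplies and adds, the shader is free to evaluate it wherever and however often it likes, which is where the extra control for developers comes from.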